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This paper introduces a new class of robust estimates for ARMA 
models. They are M-estimates, but the residuals are computed so the 
effect of one outlier is limited to the period where it occurs. These 
estimates are closely related to those based on a robust filter, but 
they have two important advantages: they are consistent and the 
asymptotic theory is tractable. We perform a Monte Carlo where we 
show that these estimates compare favorably with respect to standard 
M-estimates and to estimates based on a diagnostic procedure. 

1. Introduction. There are two main approaches to deal with outliers 
when estimating ARMA models. The first approach is to start estimating 
the model parameters using maximum likelihood and then analyzing the 
residuals with a diagnostic procedure to detect outliers. Among others, di- 
agnostic procedures for detecting outliers were proposed by Fox [9], Chang, 
Tiao and Chen [4] , Tsay [23] , Pefia [22] and Chen and Liu [5] . However diag- 
nostic procedures suffer from the masking problem: when there are several 
outliers that have similar effects, the outliers may not be detected. 

A second approach is to use robust estimates, that is, estimates that 
are not much influenced by outlying observations. A detailed review of ro- 
bust procedures for ARMA models can be found in Chapter 8 of Maronna, 
Martin and Yohai [16]. In that chapter, it is shown that in the case of an 
AR(p) model, one outlier at observation t can affect the residuals corre- 
sponding to periods t' , t < t' < t + p; in the case of an ARMA(p, model 
with q > 0, it can affect all residuals corresponding to periods t' > t. For 
this reason estimates based on regular residuals (e.g., M- or S-estimates) 
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are not very robust. One way to improve the robustness of the estimates 
is to compute the residuals using the robust filter introduced by Masreliez 
[20]. This robust filter approximates the one-step predictor in ARMA mod- 
els with additive outliers. Several authors have proposed estimates that use 
residuals computed with the Masreliez filter. For instance, Martin, Samarov 
and Vandaele [19] proposed filtered M-estimates, Martin and Yohai [18] fil- 
tered S-estimates and Bianco et al. [1] filtered r-estimates. However, we can 
mention two shortcomings of the estimates based on filtered residuals. First, 
these estimates are asymptotically biased. Second, there is not an asymptotic 
theory for these estimators, and therefore inference procedures like tests or 
confidence regions are not available. 

In this paper, we propose a new approach to avoid the propagation of the 
effect of one outlier when computing the innovation residuals of the ARMA 
model: we define these residuals using an auxiliary model. For this pur- 
pose we introduce the bounded innovation propagation ARMA (BIP-ARMA) 
models. With the help of these models, we are able to define estimates for 
the ARMA model that are highly robust when the series contains outliers. 

We show that the mechanisms of the proposed estimates to avoid the 
propagation of the outliers are similar to those based on robust filters. How- 
ever, the advantage of these estimates over those based on the robust filters 
is that they are consistent and asymptotically normal under a perfectly ob- 
served ARMA model. 

The proposed estimates can be considered as a generalization of the MM- 
estimates introduced by Yohai [25] for regression. In the first step we define 
a highly robust residuals scale and in the second step we use a redescending 
M-estimate, which uses this scale. 

For brevity's sake, we have omitted in this paper some of the proofs. All 
the proofs can be found in Muler, Peha and Yohai [21]. 

The paper is organized as follows. In Section 2 we introduce the new 
family of models and show that the corresponding residuals are similar to 
those obtained with a robust filter. In Section 3 we introduce the proposed 
estimates. In Section 4 we establish the main asymptotic results: consistency 
and asymptotic normality. In Section 5 we discuss the computation of the 
proposed estimates. In Section 6 we discuss robustness properties of the 
proposed estimates. In Section 7 we present the results of a Monte Carlo 
study. In Section 8 we show the performance of the different estimates for 
fitting a monthly real series. In Section 9 we make some concluding remarks. 
Section 10 is an Appendix with the main proofs of the asymptotic results. 
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2. A new class of bounded nonlinear ARMA models. 

2.1. BIP-ARMA models. We are going to consider a stationary and in- 
vertible ARMA model that can be represented by 

(1) cl){B){xt-fi) = e{B)at, 

where at are i.i.d. random variables with symmetric distribution and where 
<j){B) and 9{B) are polynomial operators given by 4>{B) = 1 — X]f=i cind 
0(B) = 1 — X]i=i OiB^ with roots outside the unit circle. 

If at has first moment we have that E{xt) = /i- Let \{B) = (j)~^{B)6{B) = 
1 + ^iB^ cind consider the MA(cx)) representation of the ARMA process 

oo 

(2) xt = + at + ^Xittt-i. 

i=l 

We can model contaminated ARMA processes with a fraction e of outliers 

by 

(3) zf = (1 - C!)xt + C!wt, 

where xt is the ARMA model, wt is an arbitrary process and Q is a process 
taking values and 1 such that lim^^oo l/^(Z)r=i Ct) = ^- example, Q 
may be a stationary process such that E{Q) = e. The case of additive outliers 
corresponds to wt = xt + ft, where xt and vt are independent processes. 
Replacement outliers correspond to the case that the processes xt and wt 
are independent. According to the dependence structure of the process Q , 
we can have additive outliers or patchy outliers. For detail, see Martin and 
Yohai [17]. Robustness is related to the possibility of accurately estimating 
the parameter of the central model xt when we observe the contaminated 
process zf. 

Another type of outlier are innovation outliers. An ARMA process with 
innovation outliers occurs when we observe an ARMA process satisfying (1) 
but the innovations at have a heavy-tailed distribution. Regular M-estimates 
can cope with this type of outlier. See for example Maronna, Martin and 
Yohai [16]. 

We will use the following family of auxiliary models 

oo 

(4) = A* + «i + X! '^i'^'^ 

i=l 

where the at's are i.i.d. random variables with symmetric distribution and 
0" is a robust M-scale of at, which coincides with the standard deviation in 
the case that the at's are normal, the Aj's are the coefficients of (j)~^{B)9{B) 
and rj{x) is an odd and bounded function. An M-scale of oj is defined as 
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the solution of the equation E{p{at/cr)) = b. We call this model the bounded 
innovation propagation autoregressive moving average model (BIP- ARMA) . 

To obtain robust and efficient estimates we will choose 77 bounded and 
such that there exists k with r]{x) = x for |x| < k. More details on how to 
choose p, b and i] are given in Sections 3.1 and 6. Note that in this model 
the lag effect of a large innovation in period t has a bounded effect on yt+j 
for any j > and, since Xj — > exponentially when j ^ 00, this effect will 
almost disappear in a few periods. 

Note that (4) can also be written as 

yt = fi + at- av(^^^ + act>-\B)9{B)r^(^'^ 
and multiplying both sides by (l){B), we get 

<l^{B)yt = p(^l-j2<P^^+ HB)at - a<P{B)ri(^^^ + a9{B)rj(^^^ 
which is equivalent to 

P r , 

(5) = Of + /i + E (\>i{yt-i - A*) - X! ( '^i^t-i + ~ '\>i)^ri 

1=1 i=l ^ 

where r = max(p, q). If r > p, (/>p+i = ■ • • = (\)r = ^ and if r > Qqj^\ = ■ ■ ■ = 
9r = 0. 

2.2. Robust filters and BIP-ARMA models. Let us analyze the relation- 
ship of the BIP-ARMA model and an ARMA model with additive outliers. 
The BIP-ARMA model can be also be written as yt = {1 — Ct)xt + Cti^t + ^'t), 
where xt = yt — at + a* is an ARMA model satisfying (f){B){xt — p) = 6{B)a^, 
al = crr]{at/a), ut = at — a^, = I{\at\ > k) and e = P{\at\ > k). However, 
in the BIP-ARMA model, Q and vt are not independent and they are also 
not independent of xt- 

We will show that the one-step forecast in the BIP-ARMA model is similar 
to the forecast obtained by using the robust filter for ARMA models intro- 
duced by Masreliez [20]. The Masreliez filter was proposed as an approxima- 
tion to one-step predictors for additive models of the form (3), where xt is a 
Gaussian ARMA model, Q are i.i.d. Bernoulli variables with P{Ct = 1) = £ 
and Uf are i.i.d. normal random variables. 

Suppose we have an ARMA series and we suspect that it is 

contaminated with additive outliers. Assume first that we know the param- 
eters (f), 9, ji and a of the ARMA model. The robust filter computes a "clean" 
series y* , and filtered innovation residuals at that are obtained by the follow- 
ing recursive procedure. Suppose the cleaned series yf, . . . ,yf_i, and the fil- 
tered innovation residuals Si, . . . , a^-i previous to time t are computed. Since 
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= M - E~i T^iiVt-i - Ai) + at, where 7r(i?) = = 1 + E^=l T^iB\ 

the one-step ahead robust forecast of yt is obtained by replacing the yt-iS 
by the cleaned values yj_j's, that is, the one-step robust forecast of yt is 
obtained by 



oo 

(6) = ^ - E ^^(yU - ^) = - {0{Br^<l>{B) - l){yl - ^i), 

=1 



where yt = IJ- for t <0. The filtered innovation residual for period t is com- 
puted by a* = yt — yl and the cleaned value y^ by 



(7) yl = Tt + StT]* i^fj =yt- a* + stv* (^^ 

where st is an estimate of the one-step prediction error scale and where 
r]* has the same properties as those stated for 77, in particular for some 
A; > it holds that r]*{u) = u for \u\ < k. Observe that if \a*\ < k, then 
strj*(al/st) =CLt ^"^^ Vt — Vt- Recursive formulae for st can be found in 
Martin, Samarov and Vandaele [19]. 

We can easily derive from (6) and (7) that 

(8) = /i + E {yU - - E e^str^* f ^) . 

ti fr[ \ St J 

Now, from (5), the one-step forecast for yt in the BIP-ARMA model is given 
by 



(9) yt= ii + ^(t)A yt^i -IX- at-i + ar] i ) ) - E 

which is similar to (8) taking as the cleaned series 

(10) y*t=yt-at + (Trj{at/a). 

The main difference is that here st is taken constant and equal to a. Thus, 
the filtered residuals used by Martin, Samarov and Vandaele [19] and Bianco 
et al. [1] are very similar to those of a BIP-ARMA model. In the next section, 
we will use the model (4) to define robust estimates of the parameters of an 
ARMA model that may contain additive outliers. 

3. Bounded MM-estimates for ARMA models. Assume that yi , . . . , 
are observations corresponding to a BIP-ARMA model and that the den- 
sity of at is f{u). The conditional log-likelihood function of yp+i, . . . ,yn 
given yi, . . . ,yp and the values ap_^_^_^{f3, u) = 0, . . . , ap(/3, a) = 0, where r = 
max(p, q) can be written as 



(11) L,ip,a)= V logfia1iP,a)) 



n 

E 

t=p+i 
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where from (5), the functions a!l{f3,a) are defined recursively for t>p+l 
by 



4 <y) = yt- i^-Y^ 4>i{yt-i - ^) 



+ j2[Mti{/3,'T) + {0i-<Pi)cTv(^ 




In the case of a pure ARMA model, r]{u) = u, (12) reduces to 



p q 



(13) at{P) = yt- fJ^-Y^ (t>i{yt-i - IJ) + Y0iat-i{(3). 



i=l i=l 



Since ML-estimates are not robust, we will consider M-estimates, which 
minimize 



where p is a bounded function and a is an estimate of a. 

We observe that the M-estimates defined in (14) require an estimate a of 
a. This leads us to define in Section 3.2 a two-step procedure for estimating 
(3 that we call MM-estimates. 

3.1. M-estimates of scale. Huber [12] introduced the M-estimates of scale. 
Given a sample u = {ui, . . . , m„), Uj € i?, an M-estimate of scale S'n(u) is de- 
fined by any value s G (0, oo) satisfying 



where p is a function satisfying the following property PI: 

PI: p{0) = 0, p{x) = p{—x), p{x) is continuous, nonconstant and nonde- 
creasing in \x\. 

We can define two asymptotic breakdown points of an M-estimate of scale: 
the minimum fraction of outliers that are required to make the estimate in- 
finity, , and the minimum fraction of inliers that may take the estimate to 
zero, eg. Huber [13] shows that e%^ = h/a and eg = 1 — b/a, where a = maxp. 
Then, the global breakdown point of these estimates is e* = min(e^, 1 — ej^) 
and taking h = a/2, we get a maximum breakdown point of 0.5. To make 
the M-scale estimate consistent for the standard deviation when the data 
are normal, we require that E^{p{x)) = b where $ is the standard normal 
distribution. 



(14) 




(15) 
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3.2. MM- estimates. The MM-estimates for regression were introduced 
by Yohai [25] to combine high breakdown points with high efficiency under 
normal errors. The key idea of the MM-estimates is to compute in the first 
step a highly robust estimate of the error scale, and in the second step this 
scale estimate is used to compute an M-estimate of the regression param- 
eters. For time series models, these two steps are not enough to guarantee 
robustness. This is due to the fact that an outlier in one period not only 
affects the residual corresponding to this period but it may also affect all the 
subsequent residuals. To avoid this propagation we define MM-estimates for 
the ARMA model, where the residuals are computed as in the BIP-ARMA 
model instead as in the regular ARMA model. Then, the procedure for com- 
puting MM-estimates is as follows. 

Step 1. In this step we obtain an estimate of a. For this purpose we 
consider two estimates of a, one using an ARMA model and another using 
a BIP-ARMA model, and choose the smallest one. 

Let pi be a bounded function satisfying PI and such that if 6 = E{pi{u)), 
then 6/max/9i(n) = 0.5. This guarantees that for a normal random sample 
the M-scale estimator s based on pi converges to the standard deviation and 
that the breakdown point of s is 0.5. Put 

Bo,c = {ict>,0)eRP+'^:\z\>l + C 

(16) 

holds for all the roots z of (f>{B) and 6{B)}. 

Let us call Bq = ;Bo,c fo'^ some small C > aiid ^ = ^o,( ^ Then, we define 
an estimate of (3 

(17) ^5 = argmin5„(a„(/3)) 

and the corresponding estimate of a 

(18) s„ = S„(a„(3s)), 

where a„(/3) = (ap+i(/3)), . . . ,an{P) and 5„ is the M-estimate of scale based 
on pi and b. 

Let us describe now the estimate corresponding to the BIP-ARMA model. 
Define = {4>\,e\,p\) by the minimization of S'„(a^(/3, a(^, 0))) over B. 
The value a{cj>,6) is an estimate of a computed as if (0,0) were the true 
parameters and the ot's were normal. Then, since in this case the M-scale a 
coincides with the standard deviation of at, from (4) we have 
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where = vav{r]{at/(j)) and o"^ = var(yj). Let ay be a robust estimate of o". 
and Ii? = var(7/(z)) where z has A^(0, 1) distribution. Then, we define 

(19) a2((/), ^ 



l + ^2^~iA2(0,0)' 
The scale estimate corresponding to the BIP-ARMA model is defined 

by 

(20) 31 = (0|,g|,M^) = argmin5„(a:;(/3,'?(</',0))) 
and 

(21) s'^ = Sn{^'MM^''sM))), 

where a,^(/3,cj) = {ap^i{(3,a), . . . ,a^(/3,cj)). Our estimate of a is 

(22) <=min(5„,,4). 

As we will see in the next section, if the sample is taken from an ARMA 
model without outliers, asymptotically we obtain s„ < s^. We should point 
out that despite the fact that k was computed as if the a^'s were normal, 
the asymptotic properties of the estimators are not going to depend on this 
assumption. 

Step 2. Consider a bounded function p2 such that satisfies PI and p2 < 
pi. This function is chosen so that the corresponding M-estimate is highly 
efficient under normal innovations. Let 

(23) MM = ^tp2('-^ 
and 



(24) M^(/3) = P2 

We define the estimates (3m and by the minimization over B of Mn{(3) 
and M^{f3), respectively. Then, the MM-estimate (3\[ is equal to j3M if 
M„(3m) < M^(3^^) and is equal to 3m if M„(3a/) > M^^0\j). 

For instance, we can take P2{x) = pi{Xx) with < A < 1. If P2{0) > 0, p2 
will be close to a quadratic function when A tends to 0. 

Remark 1. One important problem that will be only briefly mentioned 
here is that of the robust model selection. One possibility to explore is to 
adapt to ARMA models the robust finite prediction error (RFPE) selec- 
tion criterion given in Section 5.12 of Maronna, Martin and Yohai [16] for 
regression. 
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In the next section we will show that when the sample is taken from an 
ARMA model without outliers, for large n the estimate will choose f3\,j = 
(3m- In our Monte Carlo study of Section 7 we observe that if the saniple 
has enough additive outliers we may have = This implies that 
and f3\i have the same asymptotic distribution for any r/. However, the 
efficiency of (3\,j for finite sample size depends on rj. If the interval where r] 
coincides with the identity increases, the efficiency for finite sample size of 
(3\^ will increase as well, but the propagation of the outliers effect will gain 
importance and so the estimate will lose robustness. 

4. Asymptotic results. The main results of this section, stated in The- 
orems 4 and 6, are the consistency and asymptotic normality of the BMM- 
estimators for ARMA models. These theorems require to prove first the 
consistency of S- and the consistency and asymptotic normality of MM- 
estimators. We stated these results in Theorems 1, 3 and 5, respectively. The 
link that relates the properties of S- and MM- to those of BMM-estimates 
are Theorems 2 and 4. 

Consider the following assumptions: 

P2. The process yt is a stationary and invertible ARMA{p,q) process 
with parameter Pq = {4>q, 6q, /xq) € B and £'(log"*" \at\) < oo, where log"*" a = 
max(loga,0). The polynomials (j)o{B) and 9q{B) do not have common roots. 

P3. The innovation at has an absolutely continuous distribution with a 
symmetric and strictly unimodal density. 

P4. P{at G C) < 1 for any compact C. 

P5. The function r] is continuous, even and bounded. 

The following theorem establishes the consistency of the S-estimates based 
on ARMA models. 

Theorem 1. Assume that yt satisfies P2 with innovations at satisfying 
P3. Assume also that pi is bounded and satisfies PI with suppi > b, and 
that tpi = p'l is hounded and continuous. Then, (i) the estimate (3$ defined 
in (17) is strongly consistent for (3q. (ii) Let Sn be the scale estimate defined 
in (18). Then Sn — ^ so where sq is defined by E{pi{at/ sq)) = b. 

The next theorem establishes that under a regular ARMA model (3s and 
(3g are asymptotically equivalent. 

Theorem 2. Assume that yt satisfies condition P2, with innovations at 
satisfying P3 and P4. Assume also that pi is bounded and satisfies PI with 
suppi > b, that ipi = p'l is bounded, continuous and that rj satisfies P5. Then 
if yt is not white noise, with probability 1 there exists no such that = Ps 
for all n>nQ and then s* defined in (22) verifies s* — sq a.s. 



10 



N. MULER, D. PENA AND V. J. YOHAI 



Note that when yt is white noise both models: the regular ARMA and 
the BIP-ARMA, coincide. Then, the assumption that yt is not white noise 
is essential for the validity of the theorem above. 

The following theorems shows the consistency of the MM-estimate. 

Theorem 3. Assume that yt satisfies condition P2, with innovations at 
satisfying P3. Assume also that pi, i = l,2, are bounded and satisfi/ PI, ipi = 
p'^, i = l,2 are bounded and continuous and that sup pi> b. Then — (3q 
a.s. 

The next theorem shows that asymptotically under a regular ARMA 
model Pm and /3^^ are equivalent. 

Theorem 4. Suppose that the assumptions of Theorem 3, P4 and P5 
hold. Then if yt is not white noise, with probability 1 there exists uq such 
that (3\j = 0]\j for all n > no and then I3\,j /3o a.s. 

The following theorem shows the asymptotic normality of the MM-estimates. 



Theorem 5. Suppose that the assumptions of Theorem 3 hold. More- 
over, assume that ip2 and -02 ^'"c continuous and bounded functions and 
(7^ = E(at) < oo. Then we have 



where 



D 



slEFMiat/so)) (a-^C-^ 



2 > 



E%m(^t/s,)) V Co 

(25) 

^° 1 - Eti% 

and C = {cij) is the (p + q + l) x {p + q + 1) matrix given by 

oo 

= X! VkVk+j-i ifi<j<p, 

k=0 



Cp+i,p+j 



vDkVDk+j^i ifi<j<- 



k=0 



YiUkVk+j-i ifi<p,j<q,i<j, 



k=0 

oo 



■^Vk^k+i-j ifi<pj<q,j<i, 

k=0 
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where (po'^{B) = 1 + J2°Z^ ViB' and 9o^iB) = 1 + J2Zi ^i^'- Observe that 
when the at 's are normal, 

Remark 2. When P2{u) = u'^, (3m is the conditional maximum hkeH- 
hood estimate corresponding to normal errors. Let Fq be the distribution of 
at, then, in this case SQEpg{ip2{at/ sq))/ EF^'^{ip2{at/so)) =c^- Therefore, the 
asymptotic efficiency of the MM-estimate with respect to the normal condi- 
tional maximum likelihood estimate when the innovations have distribution 
Fo is 



(26) EFF(V^2,i^o 



Choosing '02 conveniently we can make this efficiency as close to one as 
desired for the case that Fq is normal. 

Remark 3. The relative efficiency of the MM- and BMM- estimates 
given by (26) is the same as the one of the M-estimates of location with 
respect to the mean. This implies the well-known fact that M-estimates are 
robust for innovation outliers, that is, when < t < n, correspond to a 
perfectly observed ARMA model but the distribution Fq of at is heavy tailed. 

Remark 4. When E{a1) = oo, the rate of convergence of M-estimates 
of 4> and may be larger than and the asymptotic distribution non- 

normal. This case was studied by several authors. See, for example, Davis, 
Knight and Liu [7] and Davis [8]. 

Remark 5. Theorems 1-5 use P3 only to guarantee that for all a, the 
function g{^) = E{p{{at — n)/cr) has a unique minimum at = 0. If g{ii) 
has a unique minimum at 7^ 7^ 0, then the estimates of and are still 
consistent, but the estimate of fi will converge to /xq +'p. 

Finally, from Theorems 4 and 5 we derive the following theorem: 

Theorem 6. Suppose the assumptions of Theorem 5, P4 and P5 hold. 
Then {n—p)^/'^((3%j — f3Q) converges in distribution to aN{0,D) distribution, 
where D is defined in (25). 

Note that the assumptions of Theorems 2 and 4 include condition P4. 
However, this condition is not strictly necessary and is included only to 
simplify the proofs. 

All the asymptotic theorems of this section assume that the process is an 
ARMA model. We conjecture that similar results, consistency and asymp- 
totic normality hold when the observations follows a BIP-ARMA model. 
The main difficulty to prove these results is to show that the distribution of 
Oj(/3,(t) is asymptotically stationary. 
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5. Computation. We will discuss here how to compute the MM-estimate. 
We start computing the estimates of Step 1, (3$ and According to (15), 

=P+i '• 



we can write 5^(a„(/3)) = I]r=n+i where 



(a\ Sni&niP)) 1/2 ( ati(3) 



(n-p)i/26i/2^i V5„(a„(/3)) 

Then to compute Ps we can use any nonlinear least squares algorithm, for 
example, a Marquard algorithm. Similarly we can transform the minimiza- 
tion of S'„(a^(/3, 0))) in a nonlinear least squares problem. Note that 
nonlinear least squares algorithms require a good starting point. Since the 
functions we are minimizing are nonconvex and they may have several local 
minima, the choice of the starting point is crucial. 

If the model has few parameters (e.g., p + q <3), one way to obtain the 
starting point is to generate a grid of values of the parameter and choose 
as initial estimate the one minimizing the objective function. Note that the 
case of p + q <3 is very frequent in the case of ARM A applications, where 
the use of parsimonious models is recommended. Bianco et al. [1] gave an 
algorithm to compute a highly robust starting point when there are more 
parameters. 

In the second step, to compute f^M and f3\i we can use Marquard algo- 
rithm using a similar idea and taking as initial estimate the best estimate 
computed in Step 1. 

In our simulations the estimates were defined taking 

r 0.5x2, if|a;j<2, 
P2ix) = I 0.002x8 _ 0.052x6 + 0.432x'' - 0.972x2 + 1.792, if 2 < |x| < 3, 
[3.25, if|x|>3, 

pi{x) = p2{x/0A05) and rj = p'2. Note that pi and p2 are smooth functions 
that are quadratic in the intervals (—0.81,0.81) and (—2,2), respectively. 
The function pi was chosen so that if we take 6 = max/9i/2 then the scale 
is consistent to the standard deviation for normal samples. Note that 77 is a 
redescending function. 

For fitting an ARMA(1, 1) model to 1000 observations using a MATLAB 
program, with an initial solution computed with a grid of 20 values in each 
parameter, the computing time of a BMM-estimate with the choices of pi,i = 
1,2, and r] given above is approximately 10 seconds in a personal computer 
with an AMD Athlon 1.8 GHz processor. For fitting an AR(3) model under 
the same conditions, the computing time is 1 minute 20 seconds. 



6. Robustness properties. Several robustness measures can be used for 
estimates of time-series parameters. Hampel [11] introduced the infiuence 
curve to measure the robustness of an estimate under an infinitesimal outlier 
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contamination in the framework of i.i.d. observations. Kiinscli [14] , Martin 
and Yohai [17] and Mancini, Ronchetti and Trojani [15] give generalizations 
of the influence curve for estimating time-series parameters. However, be- 
cause of its infinitesimal character, the influence curve may not be a good 
measure of the robustness when there is a positive fraction of outlier con- 
tamination. For example, it can be proved that for a very small amount of 
contamination the MM- and BMM-estimates asymptotically coincide and 
therefore their influence curves also coincide. However, we will see below in 
this section and in Section 7 that the BMM-estimate is more robust than 
the MM-estimate. Influence functions for the M-estimates of ARMA models 
can be found in Martin and Yohai [17]. 

A more reliable measure of the robustness of an estimate to cope with 
a positive fraction e of contamination is the asymptotic maximum bias. 
Consider a family of e-contaminated process 



as in (3) where k ^ K and {xt,Ct^ '^t ) stationary. Suppose also the distribu- 
tion of the uncontaminated process Xf depends on a parameter 7 S F C i?-' . 
As an example, we can consider the family of additive outliers models, which 
is obtained taking = xt + k, with k (z R. 

Suppose that for each sample size n we have an estimate 7„ of 7 and let 
7oo(L) = ijoc,i{L), . . . ,7ooj(i)) be the almost sure limit of 7„ when applied 
to a process with distribution L. The bias of the ith component 700 when 



(27) 




(l-Cf)rEt + Cf^f 








2 



4 



8 



10 



Fig. 1. Bias curve for the AR(1) model with (f> = 0.5 and 10% of additive outliers where 
k is the outlier size. 
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n 1 1 1 1 1 — 

10 20 30 40 50 

percentage of contamination: 

Fig. 2. Maximum bias for the AR(1) model with = 0.5. 
applied to zf^ as defined in (27) is 

where C{z^^) denotes the distribution of the process zf^. The maximum 
asymptotic bias of the ith. component is defined by 

M5(7ooj,7,e) = sup5(7ooj,7>^>^)- 

We have approximately computed the maximum bias curves of the MM- 
and BMM-estimates for Gaussian AR(1) and MA(1) models with additive 
outliers {w^ = Xt + k) and where the Q are i.i.d. Bernoulli variables. To 
simplify the computation we eliminate the intercept from these models by 
assuming it to be known and null. The asymptotic value of the estimate is 
approximated using samples of size 10000. We found that for samples size 
larger than 10000 the changes in the estimate are negligible. 

In Figure 1, we show the bias curves of the MM- and BMM-estimates 
for the AR(1) model with (j) = 0.5 and e = 0.1. In Figure 2, we show the 
maximum biases curves for the MM- and BMM-estimates under the same 
model. In Figure 3, we show the maximum bias curve for the BMM-estimate 
under a MA(1) model with parameter 9 = —0.5. 

In both cases, we observe that the BMM-estimate has a smaller maximum 
bias than the MM-estimate. We also observe that the behavior of the MM- 
is different from the BMM-estimate. After the contamination is larger than 
some value e* the maximum bias of the MM-estimate is constantly equal 
to the value of the estimated parameter. This means that the asymptotic 
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value of the estimate becomes independently of the true value of the pa- 
rameter. This value e* corresponds to the breakdown point notion proposed 
by Genton and Lucas [10]. For the AR(1) model the value e* depends on 
(p. For the MA(1) model e* = 0. Instead, the behavior of the BMM-estimate 
is different and apparently the estimate does not break down. A surprising 
feature of its maximum bias curve is that for very large e the maximum 
bias starts decreasing. This can be explained as follows: when e is large, the 
probability of obtaining a patch of two or more outliers increases. The effect 
of a patch of outliers is to increase the correlation of the series. Therefore, in 
the case of the AR(1) model with (j) positive and MA(1) with negative it 
prevents that the parameter further approximates to zero for outliers with 
fixed size k. We also computed the maximum bias curves for other values of 
parameters (j) and 9 and the results were similar. 

We conjecture that the robust behavior of the BMM-estimate under ad- 
ditive outlier contamination also holds when we observe any contaminated 
process zf as given in (3). The reason is that since the BIP-ARMA model 
includes a built-in filtering to compute the residuals, a small fraction of out- 
liers will affect only a small fraction of residuals. Therefore, since the loss 
function of the BMM-estimate is bounded, the estimate will not be largely 
affected by a small fraction of large residuals. We compute maximum bias 
curves for the case of replacement outliers {w^ = k), obtaining similar results 
as for the case of additive outliers. 

7. A Monte Carlo study. We have performed a Monte Carlo study to 
compare several estimates for ARMA models. We have simulated three 



MM 
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Gaussian stationary ARMA models considering the case that the series do 
not contain outliers and the case that the series have 10% of equally spaced 
in time additive outliers. The size of the additive outliers was taken equal 
to 4 and 6. The sample size in the simulations is 200 and the Monte Carlo 
study was done with 500 replications. 

The estimates considered in this study are (i) the normal conditional max- 
imum likelihood estimate (MLE), (ii) an MM-estimate where the residuals 
are computed as in a regular ARMA model (MM), (iii) an MM-estimate 
where the residuals are compared with the ones of a BIP-ARMA model 
(BMM), (iv) an estimate based on the diagnostic procedure described in 
Chen and Liu [5]. The cutoff point for outlier rejection is chosen by the 
Bonferroni inequality as c = <I>~^(1 — (0.05/n)), where <1> is the A^(0, 1) dis- 
tribution function. We denote this estimate by (CTC^)- (v) The same as 
in (iv) but the cutoff point is c = 3 (CTC3). (vi) The tau filtered estimate 
proposed by Bianco et al. [1]. We denote this estimate by (FTAU). 

The estimates MM and BMM are based on the functions pi and p2 and rj 
described in Section 5. We have simulated three models: AR(1) with (j) = 0.5, 
MA(1) with 9 = -0.5 and ARMA(1, 1) with (p = 0.5 and 6 = -0.5. The value 
of /i was zero for all these models. However, since the estimates are shift 
equivariant, their behavior does not depend on the value of this parameter. 
For brevity's sake, we only show here (in Tables 1 and 2) the results for the 
AR(1) and the MA(1) model. The results for the ARMA(1, 1) models are 
similar and can be found in Muler, Peha and Yohai [21]. 

The relative efficiency with respect to the MLE when there are no outliers 
varies in the case of (j) and 9 from 80% to 86% for the estimate BMM, from 
80% to 91% for the estimate MM, is almost 100% for the CTC estimates and 
varies from 65% to 67% for the FTAU. The efficiency of all the estimates of 
is very high. We also observe that under additive outlier contamination, the 
estimate BMM of <j) and 9 behaves much better than those corresponding 
to the estimates MM, CTCb and CTC3. The performance of the estimates 
FTAU and BMM are comparable. 

The errors of the MSEs shown on Tables 1 and 2 are smaller than 15% 
with probability 0.95. However, since all the estimates were computed with 
the same samples, the errors of the differences between the MSEs of any two 
estimates are much smaller, thus making comparisons possible. 

8. An example. This example deals with a monthly series of inward 
movement of residential telephone extensions in a fixed geographic area from 
January 1966 to May 1973 (RESEX). The series was analyzed by Brubacher 
[2] and by Martin, Samarov and Vandaele [19], who identified an AR(2) 
model for the differenced series yt = xt — xt-12, where xt is the observed 
series. 
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Table 1 

MSE for the AR(1) model with (fi = 0.5 without outliers and with 10% of equally spaced 

additive outliers 



Estimate 


No outliers 


Out. size 4 


Out. 


size 6 




4> = 0.5 




4> = 0.5 




4> = 0.5 


MLE 


0.017 


0.0036 


0.189 


0.103 


0.394 


0.189 


MM 


0.018 


0.0045 


0.024 


0.085 


0.028 


0.132 


BMM 


0.018 


0.0042 


0.021 


0.014 


0.019 


0.0048 


CTCb 


0.017 


0.0036 


0.185 


0.103 


0.364 


0.189 


CTCa 


0.017 


0.0036 


0.148 


0.096 


0.057 


0.047 


FTAU 


0.019 


0.0054 


0.032 


0.011 


0.028 


0.0076 



Table 3 displays the value of the estimates MLE, MM, BMM, CTC3 and 
the FTAU together with the MAD-scale of the residuals. We can see that 
the estimated values of the parameters of the MLE and the CTC3 are quite 
different from the robust estimates MM, BMM and FTAU. The estimate 
CTCb gives the same result as CTC3 (it detects the same outliers) and is 
omitted from the table. 

Table 2 

MSE for the MA(1) model with 9 = —0.5 without outliers and with 10% of equally spaced 

additive outliers 



No outliers Out. size 4 Out. size 6 



Estimate 




e 




e 




e 


MLE 


0.010 


0.0042 


0.178 


0.128 


0.380 


0.215 


MM 


0.012 


0.0046 


0.015 


0.115 


0.016 


0.159 


BMM 


0.012 


0.0052 


0.015 


0.025 


0.012 


0.0065 


CTCb 


0.011 


0.0042 


0.174 


0.130 


0.345 


0.218 


CTC3 


0.011 


0.0044 


0.136 


0.125 


0.042 


0.064 


FTAU 


0.012 


0.0065 


0.020 


0.031 


0.017 


0.021 



Table 3 

Estimates of the parameters of the RESX series 



Estimates 


M 


01 


4>2 


MAD 


MLE 


2.69 


0.48 


-0.17 


1.70 


MM 


1.18 


0.34 


0.31 


1.43 


BMM 


1.74 


0.42 


0.36 


1.24 


CTC3 


3.44 


1.14 


-0.74 


1.86 


FTAU 


1.71 


0.27 


0.49 


1.10 
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Figure 4 shows the data yt obtained differentiating the observed data as 
yt = xt — Xt-i2 and the cleaned values as in (10), which are seen to be almost 
coincident except at outlier locations. 

9. Concluding remarks. We have presented two families of estimates for 
ARM A models: MM-estimates and BMM-estimates. The BMM-estimates 
use a mechanism that avoids the propagation of the full effect of the out- 
liers to the subsequent residual innovations. To make this mechanism com- 
patible with consistency when the true model is ARMA, we consider two 
estimates: one is obtained fitting a regular ARMA model and the other fit- 
ting a BIP-ARMA model, where the propagation of the effect of outliers 
is bounded. Then, the estimate that fits better to the data is selected. We 
have shown in Sections 6 and 7 that, at least for additive outliers, the BMM- 
estimates are much more robust than the MM-estimates and quite compa- 
rable with the FTAU-estimates. The main advantage of the BMM-estimates 
over the FTAU-estimates is that an asymptotic theory is now available and 
this makes inference with BMM-estimates possible. The Monte Carlo results 
of Section 7 also show that the BMM-estimate compares favorably with the 
estimate based on the Chen and Liu [5] diagnostic procedure. 

APPENDIX 

Suppose that we have the infinite sequence of observations Yt = (. . . , yt-k, 
. . . ,yt-i,yt) generated by a stationary and invertible ARMA(p, process 
up to time t with parameter Pq. Given any /3 = {cf),0,^) such that the 



8 - 




Fig. 4. 



Differenced RESEX Series: observed (solid line) and filtered (dots) values. 
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polynomials (piB) and 0(B) have all the roots outside the unit circle, let 
us define af{(3) = 9~^{B)(j){B){yt — j-t). Then af (/3o) = at and af (/3)'s satisfy 
the following recursive relationship 

p q 

i=l i=l 

In the case that at has finite first moment, we have that at{f3) = yt — 
E{yt\Yt-i), where the conditional expectation is taken assuming that the 
true value of the parameter vector is f3. 

We will use the following notation. Given a function g{vL) -.R^^R, we 
define V(7(u) as the column vector of dimension k whose ith element is 
Vig{\i) = dg{u)/dui and \/'^g{u) is the k x k matrix whose element is 
Vfjg{u) = d'^g{u)/dui duj. 

The next lemma proved in Lemma 2 of Muler, Pefia and Yohai [21] gives 
the Fisher Consistency of the S-estimate when we have all the past obser- 
vations. 

Lemma 1. Assume that yt satisfies condition P2 with innovations sat- 
isfying P3. Assume that pi is a bounded function satisfying condition PI, 
define s{f3) 6y £'(pi(af (/9)/s(/3))) = 6. Then, if (3 ^ B and (3 ^ (3q we have 
so = s{(3o) <s{(3). 

The proofs of the next two lemmas can be found in Lemmas 5 and 6 of 
Muler, Peha and Yohai [21]. 

Lemma 2. Under the assumptions of Theorem 1, for any d> we have, 
lim sup \Sn{sLn{(3)) — s{(3)\ = a.s. 

Lemma 3. Under the assumptions of Theorem 1, there exists d> sat- 
isfying 

lim inf inf 5„(a„(/3)) > sq + 1 a.s. 

Proof of Theorem 1 . Take e > arbitrarily small and let d be as in 
Lemma 3. By the dominated convergence theorem it is easy to show that 
s{(3) is continuous. Then by Lemma 1, there exists < 7 < 1 such that 

min s(3) > sq + 7- 

/3gBox[-d,d],||/3-/3g||>£ 

Therefore by Lemma 2, there exist ni such that for n>ni 

min 5'n(/9) > So + 7/2 

/3eBoxM,d],||/3-/3oll>e 
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and SniPo) •so + 7/4- Moreover by Lemma 3, there exists n2 such that for 
n > 712 

inf S'„(a„(/3)) > sq + 7 a.s. 

|M|>d,(0,0)GBo 

Therefore, for n > max(ni,n2) it holds that \\f3s ~ f^oW < ^ ^-iid this proves 
the theorem. □ 

The next three lemmas will be used to prove Theorem 2. 
The proof of the next two lemmas can be found in Lemmas 7 and 8 of 
Muler, Peha and Yohai [21]. 

Lemma 4. Assume that yt satisfies condition P2. Given d> and a > 0, 
there exist constants C > and < < 1 such that 

sup sup^|ai(/3,cj) -ytl <C a + , t>p + l. 

/3GBox[-d,d]o<cr<o- V i=l J 

Lemma 5. Under the assumptions of Theorem 2, given d> 0, there ex- 
ists (5 > such that 

liminf inf SnistiiB.aiS.O))) > sn + 5 a.s. 

n^oo fieBoy.[-d,d] 

The proof of the next lemma can be found in Lemma 9 of Muler, Peha 
and Yohai [21]. 

Lemma 6. Under the assumptions of Theorem 2, there exists d> such 
that, 

liminf inf Sn(aiL(d,a(<b,0))) > sq + 1 a.s. 

n->oo|At|>rf,(0,0)Gi3,) 

Proof of Theorem 2. From Lemmas 5 and 6 we have that there exists 
S > such that 

lim inf inf 5.„(a^(/3, 0))) > sq + ^ a-s. 

But, by Theorem l(ii) we have that Ps satisfies lim„^oo Sni^nif^s)) = sq 
a.s. This proves the theorem. □ 

The following four lemmas will be used to prove Theorem 3. The proofs 
can be found in Lemmas 10, 11, 12 and 13 of Muler, Peha and Yohai [21]. 

Lemma 7. Assume that yt satisfies condition P2 with innovations sat- 
isfying P3 and assume that p2 satisfies condition PI with p2 bounded. Let 
us call m{f3) = E{p2{al{f3)/ sq)), then /3q = argmin^ggm(/3). 
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Lemma 8. Assume that yt satisfies condition P2 and p2 condition PI. 
Define 



(28) 

Then, we have, 



Mm 



1 



n — p 



t=p+l 



«f(/3) 



lim 



sup 

' (3eBox[-d,d] 



am 

So 



a.s. 



for all d>0. 



Lemma 9. Under the assumptions of Theorem 3, we have 

lim sup |M„(/3) -M^(/3)| =0 a.s. 
f3GBox[-d,d] 

Lemma 10. Under the assumptions of Theorem 3, there exists d> and 
6 > such that 

lim inf inf MJl3)>m(dn) + 6 a.s., 

where m(/3o) is defined in Lemma 7. 

Proof of Theorem 3. Follows from Lemmas 8-10 using similar argu- 
ments as those used in the proof of Theorem 1. □ 

The next two lemmas will be used to prove Theorem 4. 

Lemma 11. Under the assumptions of Theorem 3, for all d > there 
exists 6 > such that 

lim inf inf M^(/3) > m(/3o) + 5 a.s., 
where m(/3o) is defined in Lemma 7. 

Proof. It is similar to the proof of Lemma 5. □ 

The proof of the next lemma can be found in Lemma 15 of Muler, Peha 
and Yohai [21]. 

Lemma 12. Under the assumptions of Theorem 3, there exists d> and 
5 > such that 

lim inf inf M^(3)>m(l3n) + 6 a.s., 
where m(/3o) is defined as in Lemma 7. 
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Proof of Theorem 4. Prom Lemmas 11 and 12 we have that there 
exists (5 > such that liminf^_>oo 

mf^gB^^(/3) > m{l3Q) + 5. Theorem 3 
implies that hm„^oo -^n(/3Af) =i^{Po) a.s. This proves the theorem. □ 

The next five lemmas will be used to prove Theorem 5. 



Lemma 13. Under the assumptions of Theorem 5, we have 



1 



Tj2 E 



t=p+l 



So 



where 
(29) 



Vo = e(Vp2 



So 



Vp2 



N{0,Vo), 



So 



Proof. This lemma follows immediately from the Central Limit The- 
orem for Martingales (see Theorem 24.3, Davidson [6]). Por details, see 
Lemma 16 of Muler, Peha and Yohai [21]. □ 

Lemma 14. Under the assumptions of Theorem 5 we have 



n— >oo (77, — p)l/2 



t=p+l 



Vp2 



So 



in probability. 



Proof. The proof is similar to the one of Lemma 5.1 in Yohai [24] for 
MM-estimates in the case of regression. The details can be seen in Lemma 
17 of Muler, Peha and Yohai [21]. □ 

The proof of the next lemma can be found in Lemma 18 of Muler, Peha 
and Yohai [21]. 



Lemma 15. Under the assumptions of Theorem 5, we have 



E 

t=p+i ^ 



atiPo) 



Vp2 



afiPo) 



a.s. 



The proof of the next lemma can be found in Lemma 19 of Muler, Peha 
and Yohai [21]. 



Lemma 16. 



Under the assumptions of Theorem 5 we have for all d> 0, 
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(i) 



lim sup 

f3£Box[-d,d] 



1 

n — p 



t=p+i 



«f(/3) 



P2 



am 

So 





a.s., 



where 11 All denotes the I2 norm of the matrix A. 
(ii) 

e( V^p4^]) = -Li?(^^(^))i?(Va?(/3o)Va?(/3o)') 



So 



and this matrix is nonsingular. 

The proof of the next lemma can be found in Lemma 20 of Muler, Peha 
and Yohai [21]. 

Lemma 17. Under the assumptions of Theorem 5, we have, 



1 



hm sup 

''^°°l3eBox[~d,d]'n-p 

for all d>0. 



t=p+i 



(am 



VV2 



a.s. 



Proof of Theorem 5. The estimate 3m satisfies Ya=p+i Vp2(at(y§M)/ 
s* ) = 0. Then, using the Mean Value Theorem we have 



(30) E Vp4^ + E VVJ^ (3m-/3o) = 0, 

where /3* is an intermediate point between f3M and (3^. 

From Theorem 3 we have that (^m Po Take d > so that d > |/Uo|, 
then with probability one there exists no such that /3j\/ € /Bq x [—d, d] for all 
n > riQ. From Lemmas 16(i) and 17 we get 



(31) 

Put 
(32) 



lim sup 



1 



n — p 



t=p+i 



am 



So 



a.s. 



Ar. 



1 

n — p 



E 



t=p+i 



24 N. MULER, D. PENA AND V. J. YOHAI 

Then, since (3* /3o a-S- and E(S/'^ p2{af{P)/sQ)) is continuous in /3 we have 
that 

(33) jirn^„ = i?(^VV2(^^)) a.s. 

Therefore from Lemma 16 (ii), for n large enough An is nonsingular. Then, 
from (30) for large enough n we have (n — p)^^'^{I3m — Po) = A~^Cn, where 

From Lemmas 13, 14 and 15 we have that c„ — A^(0, Vq) and then 

from (33), and we get {n-pf/^^M - Pq) ->d iV(0, Ff VoFf ^), where 

yi=£;(VV2(af(/3o)/so)). 

In Theorem 5 of Muler, Peha and Yohai [21] it is proved that 

(34) VpJ^U^^^^v,, 



So / So 

where is the stationary process vector of dimension (p + (? + 1) defined by 
{-ct)^\B)at-j, ifl<i<p, 
v*i = \ Oo^{B)at-j-p, if p + 1 < j < p + g, 

I Co, ifi=p + g + i, 

where Co = -(1 - E?=i '/'Oi)/(l - E^=i S)- Then 

(35) i.(vp.(^) Vp.(^^)') = ^m^Ei.,.',). 
Differentiating V/3(af (/3)/so) we obtain 

(36) VV^f^) =^^^(^)v,v; + l^,(^) VV(/3o). 

V So / \sqJ So Vsq/ 

Since V^af (/^q) is independent of at and E{ip2{at/ sq)) = we have E{ip2{at/ 
so)V2af (/3o)) = 0, and then from (36), since at and vt are independent we 
get 



(37) 



Hence, from (35) and (37) we obtain 



Finally, it is straightforward (see, e.g., Bustos and Yohai [3]) to show that 



a 

where C is defined in the statement of Theorem 5 and a\ = E{a^). □ 
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